DEPARTMENT  OF  DEFENCE 


An  Expectation-Maximization 
Approach  to  Spectral  Structure 
Identification  and  Estimation 

Marian  Viola,  Alan  Bolton  and 
Bill  Moran 


DSTO-RR-0207 


DISTRIBUTION  STATEMENT  A 

Approved  for  Public  Release 
Distribution  Unlimited 


An  Expectation-Maximization  Approach  to  Spectral 
Structure  Identification  and  Estimation 

Marian  Viola,  Alan  Bolton  and  Bill  Moran 


Surveillance  Systems  Division 
Electronics  and  Surveillance  Research  Laboratory 


DSTC)-RR-0207 


ABSTRACT 

The  method  of  Expectation-Maximization  is  applied  to  find  an  approximate  maximum- 
likelihood  estimate  of  the  amplitudes,  phases  and  frequency  shift  of  a  known  collec¬ 
tion  of  spectral  lines  in  a  signal  with  Gaussian  noise.  The  technique  works  well  for 
many  frequency  components  provided  that  these  frequencies  are  well  separated.  It  is 
applied  to  the  identification  of  the  frequencies  present  in  a  radar  return  from  a  pro¬ 
peller  driven  aircraft,  arising  from  modulation  by  blades  of  the  propellers. 


APPROVED  FOR  PUBLIC  RELEASE 

fPEPARTMENT  OF  D  E  F  E  N  C  E  I  1%  QX  A 

DEFEICE  SCIENCE  t  TECHKOLOCY  OICAHISHTIOII  I  V 


/IQ  Toa- 0^7557 


DSTO-RR-()2()7 


Published  by 

DSTO  Electronics  and  Sun>eillance  Research  Laboratory 
PO  Box  1500 

Salisbury,  South  Australia,  Australia  5108 

Telephone:  (08)  8259  5555 

Facsimile:  (08)  8259  6567 

(c)  Commonwealth  of  Australia  2001 
ARNo.  AR  01 1-791 
October,  2001 


APPROVED  FOR  PUBLIC  RELEASE 


DSTO-RR-0207 


An  Expectation-Maximization  Approach  to  Spectral  Structure 
Identification  and  Estimation 


EXECUTIVE  SUMMARY 

Many  problems  in  signal  processing  require  the  estimation  of  the  significant  frequency  com¬ 
ponents  of  a  signal;  that  is,  of  the  frequency,  amplitude  and  phase.  In  particular  such  problems 
arise  in  estimating  the  spectral  content  of  returns  from  aircraft  where  discrete  frequency  compo¬ 
nents  may  arise  from  modulation  by  the  engines.  Good  estimates  of  these  components  are  useful 
in  identification  of  the  aircraft  type  from  the  radar  return. 

Here  is  presented  a  method  of  estimation  of  discrete  frequency  (line)  components  where  the 
inter-spacing  is  known  but  the  absolute  positions  of  the  lines  are  unknown,  as  would  be  the  case 
when  an  aircraft  return  had  unknown  doppler.  Also  unknown  are  the  amplitude  and  phase  of  the 
lines.  These  parameters  are  estimated  by  a  technique  based  on  the  Expectation-Maximization 
algorithm  which,  iteratively,  produces  good  approximations  to  the  maximum  likelihood  estimate. 

The  method  is  found  to  work  extremely  well  with  synthetic  data  with  up  to  six  line  components 
and  large  amounts  of  added  noise.  When  tested  on  real  signals,  it  is  able  to  handle  up  to  21  line 
components. 
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1  Introduction 

In  many  radar  signal  processing  problems  a  signal  with  known  spectral  structure  needs  to  be 
identified  and  its  other  parameters  such  as  amplitudes  and  phases  of  the  individual  lines  and  an 
overall  frequency  shift  (due  to  doppler)  estimated.  The  aim  of  this  paper  is  to  describe  an  algorithm 
for  extracting  this  information  from  a  signal  given  a  knowledge  of  the  relative  frequencies  of  its 
line  components. 

Radar  returns  from  propeller  driven  aircraft  contain  signal  components  due  to  reflections  from 
the  blades  of  the  propellers.  These  components  are  manifested,  in  part,  by  modulation  of  the  re¬ 
flected  rf  signal  by  a  number  of  discrete  frequencies.  At  long  ranges,  the  radar  returns  themselves 
are  significantly  contaminated  by  noise.  Consequently,  standard  signal  processing  methods  have 
not  been  easily  able  to  extract  the  frequency  components  and  give  good  estimates  of  their  phases 
and  amplitudes. 

The  underlying  frequency  components  in  returns  from  a  propeller  driven  aircraft  are  modified 
both  by  the  effects  of  doppler  and  by  variations  in  the  shaft  rate  of  the  engine.  The  former,  under 
the  narrow  band  approximation,  which  is  valid  here,  shifts  the  frequencies  while  the  latter  scales 
them.  Both  of  these  effects  need  to  be  taken  into  account  in  any  method  that  attempts  to  identify 
aircraft  by  these  means.  However  for  the  purposes  of  this  paper  we  restrict  attention  only  to 
frequency  shift,  leaving  scaling  to  a  future  publication. 

We  describe  here  a  method  based  on  the  Expectation-Maximization  (EM)  algorithm  of  Demp¬ 
ster  et.  al,  ([!]),  for  extracting  from  a  signal  the  amplitudes  and  phases  of  a  given  set  of  frequency 
components  as  well  as  their  frequency  offset.  The  elaboration  of  the  algorithm  in  [2]  is  closely 
followed. 

The  method  then,  begins  with  the  description  of  a  signal  model  comprising  a  small  number 
of  pure  frequency  components,  but  of  unknown  phase  and  amplitude.  In  addition  the  frequencies 
are  only  known  relative  to  each  other  and  not  absolutely.  White  noise  of  unknown  variance  is 
added  to  this  signal.  To  be  specific,  we  assume  a  complex  signal  s{t)  with  known  frequencies 
III  f2i  ) /t?)  “in  unknown  frequency  shift  T,  unknown  phases  ,  ^2)  ,  <^/j,  and  unknown 

amplitudes  (9i,  6*2,  ...  ,9r.  Thus  the  signal  is 

s(i)  =  +  w(i)  (1) 

r~l 

where  w{t)  is  complex  Gaussian  white  noise  with  variance  o^,  which  is  also  assumed  unknown. 
We  write 

0=(01,  02,  ...  ,0^), 

^  —  (01,  02,  ...  ,  0ft), 

f  =  (/l,  /2,  ...  ,  /ft). 

Our  aim,  as  already  indicated,  is  to  estimate  the  various  unknown  parameters  of  the  signal 
from  a  data  record,  S  =  A  standard  maximum-likelihood  approach  appears  not  to  be 

practical,  as  there  is  no  closed  form  solution  to  the  optimization  problem.  Instead  a  modified  form 
of  the  EM  algorithm  is  used.  The  EM  algorithm  is  known,  in  many  situations,  to  provide  good 
convergence  to  the  maximum  likelihood  estimate. 
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In  this  report  we  show  how  the  EM  algorithm  is  applied  the  problem  described  above  and 
obtain  results  for  simulated  data  with  up  to  6  different  frequency  components.  Two  variations  on 
the  EM  technique  with  slightly  different  update  rules  in  the  iteration  process  are  examined.  Both 
of  these  are  found  to  work  well  with  relatively  high  levels  of  noise.  In  addition  we  show  how  the 
algorithm  works  with  a  typical  return  from  a  propeller  driven  aircraft.  There  reasonable  estimates 
for  over  20  frequency  components  are  obtained. 


2  Description  of  Basic  Algorithm 


The  probability  density  of  a  record  S  =  generated  by  a  signal  model  of  the  form 

(l)i.s 


M 


P0,<j>,T{S) 


7-1 


(\/27r(T'^i^^  I  2(7'^ 

From  this  we  observe  that  the  salient  terms  in  the  log-likelihood  of  this  record  arc 


(2) 


LM.4>,r) 


M  ..  1 


A/  R 


k=\ 


r-\ 


(3) 


Note  first  that,  once  0,  4>tT  are  known,  then  an  estimate  for  can  easily  be  found  by  maxi¬ 
mizing  the  log-likelihood  (3); 


a'^  = 


1/2 


k=\ 


(4) 


In  fact,  estimates  for  do  not  enter  into  the  algorithm  except  at  this  point.  Accordingly  this 
aspect  of  the  problem  will  be  ignored  from  now  on. 

To  find  the  estimates  of  the  other  parameters  the  EM  approach  as  described  in  [2]  is  followed. 
To  simplify  formulae,  note  that  the  non-constant  part  of  the  log-likelihood  is  (the  negative  of) 


I  l  \2 

k-\  r=\ 

The  key  ingredient  in  the  EM  approach  is  the  introduction  of  new  {unobservable)  random  variables 

where  the  random  variables  are  complex  Gaussian  of  mean  zero  and  variance  of..  The  vari¬ 
ances  a'l  are  chosen  to  sum  to  a^,  so  that 


M 


r=l 
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We  shall  say  more  later  about  how  these  are  chosen.  The  appropriate  terms  in  the  log-likelihood 
for  this  vector  of  random  variables  are  then 

r=l 


The  “expectation”  phase  of  the  EM  algorithm  is  accommodated  by  calculating  the  expectation 
of  this  random  variable  conditioned  on  the  random  vector  S  =  where  parameter  values 

6^,  rt*  are  the  values  assumed  in  the  distribution  for  S.  The  particular  values  of^,  </>,  r  that 
minimize  this  expression  are  found  and  are  used  as  updated  versions  of  0*',  t**  in  an  iterative 

scheme. 

The  conditional  expectation  operator  with  respect  to  S  under  the  distribution  with  the  param¬ 
eters  0^,  ri*  is  denoted  by  and  the  absolute  expectation  by  E.  Thus  we  need  to  consider 

R  .  M  ( 

£\GY{eA,r))  =  E-2  E +  ^2 

r=l  k^l\  ^ 

where  *  denotes  complex  conjugation.  The  first  term  under  the  inner  summation  sign  does  not 
vary  with  the  parameters  so  we  may  ignore  it  in  the  maximization  process.  Our  aim,  then,  is  to 
find  9,  (j),  T  to  maximize 


R 

r=l 

To  calculate  note  that  the  random  variables  involved  are  Gaussian  and  so 

sHyD  =  ^(yD  +  (s  -  £:(S))^  ^  (6) 

where  is  the  covariance  of  Y  with  S  and  is  the  auto-covariance  of  S.  The  latter  is  just 
cr^I,  and  the  {{k,r),k')  component  of  the  former  is  0  unless  k  =  k!]  when  A;  =  A;'  it  equals  cr^. 
This  gives 

E^iylY  =  0le-MGr+r»)k+4)  +  ^(^s(A;)*  _  ^  . 

q=l 

Substituting  in  (6),  we  find  that 

R  /  ^  M 

U{9,1,t)  =  E  +  (7) 

r=l  \  ^  fc=:l 

^(s{kY  -^9le-^^^iG,+ri)k+4)^\  _ 

q=l  / 


2iti[{fr+T)k+(pr)  _ 


Met 


(5) 


In  order  to  deal  with  this  expression,  we  define 

,.2 


Ofc,,  =  +^(^s{kr  - 

9=1 


(9) 
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and  let  A  be  the  matrix  Notice  that  all  of  this  expression  is  calculable  in  terms  of  an 

instantiation  S  -  (s(k))  of  the  random  vector  S  and  the  previous  values  of  the  parameters. 

Let 

_  27ri((/,+T)k+<pr) 

^k,7'  —  c  '  , 

and  let  C  denote  the  matrix  (ca,-,,  ).  Then  (7)  can  be  rewritten  as 


r=  1 


02 

2(7,2’ 


(10) 


(11) 


where  now  *  denotes  the  Hermitian  transpose. 

Once  (j)  and  r  are  known  then  it  is  easy  to  maximize  U  for  d,-  by  choosing 

The  values  of  cf)  and  r  are  obtained  by  maximizing 

"  t>. 

r^\ 

In  fact  we  use  the  prior  estimate  6^-  rather  than  0,.  here.  In  a  sense  this  departs  from  the  formalism 
of  the  EM-algorithm,  however  the  effect  does  not  seem  to  be  significant  and  simulations  indicate 
that  the  method  converges  in  practice.  It  is  difficult,  in  this  case,  to  see  how  the  strict  methodology 
of  the  EM-algorithm  could  be  followed.  Thus  we  maximize 


U  y 


"  ei 


Or 

r  =  l  ' 


(12) 


Let  us  write 

so  that  (12)  becomes 


dk,r  = 


R  .  M 

r=l  ^  A=1 


Given  the  correct  choice  of  r,  the  correct  choice  of  i/f  to  maximize  this  are 

M 


(13) 


(14) 


For  this  choice  of  (/v  the  sum  (13)  becomes 

Or 


;•=! 


A,-=l 


-2niTk 


The  problem  then  is  to  maximize  this  over  t.  Once  this  is  done  the  previous  equations  (14),  (11) 
can  be  used  to  update  the  remaining  parameters. 


The  maximization  over  r  is  just  an  exhaustive  search  over  a  range  of  possible  values  of  t. 
Since  it  is  anticipated  that  the  range  of  such  values  will  be  relatively  small  this  can  be  accom¬ 
plished  quickly  and  easily.  In  the  real  example  we  use  for  illustration,  an  initial  crude  guess  at  the 
frequency  shift  was  made  by  observing  the  EFT  This  limited  the  range  of  r  to  be  searched. 
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3  Choice  of 


In  the  algorithm  presented  in  the  previous  section,  the  only  constraint  imposed  in  the  choice 
of  the  variances  of  the  virtual  random  variables  yj  is  that  they  sum  to  To  implement  the 
algorithm  a  choice  of  these  parameters  is  necessary. 


We  have  experimented  with  three  different  choices.  The  first  and  most  obvious  choice  is  to 
make  the  all  equal  (and  therefore  equal  to  d^/R).  In  fact  this  gave  slow  and  often  poor  conver¬ 
gence  results  compared  with  the  other  two  methods,  especially  when  the  values  of  the  magnitudes 
dr  varied  significantly.  Such  behaviour  is  not  unexpected.  If  some  of  the  ^  are  small  then  this 
choice  of  ar  imposes  a  comparatively  large  noise  on  the  rth  signal  component.  Because  of  its  poor 
performance  we  do  not  report  on  simulations  with  this  choice  of  o^. 


Before  describing  the  second  choice  of  o^,  we  first  consider  a  definition  of  (denoted  by  o^r) 
that  involves  making  them  proportional  to  the  magnitude  of  the  corresponding  signal  component, 
that  is, 

4  _  O'r 

This,  of  course,  satisfies  the  sum  constraint.  In  fact  it  was  found  that  using  this  method  gave  rise 
to  problems  with  the  components  of  very  small  magnitude.  For  these,  the  amount  of  adjustment 
imposed  by  the  algorithm  from  one  iteration  to  the  next  was  too  small  and  so  a  lower  threshold 
was  imposed  on  the  size  of  the  o^.  To  be  precise,  for  the  second  choice  of  the  variances,  we  define 

^  =  max{atr,0.2* 

r~l 


and  then  normalize  to  obtain  o?: 


Er 
r=l  ^ 


As  we  shall  see  in  the  next  section  this  method  performs  quite  well  both  for  simulated  and  real 
data.  We  call  it  the  normalizing  method. 


A  third  choice,  which  works  surprisingly  well,  requires  relaxation  of  the  constraint  on  the  sum 
of  the  cr^’s.  In  fact  the  values  of  are  all  equal  at  each  iteration  and  are  of  the  form 


=  ^{'^  +  log{l+  p)), 

where  p  is  the  number  of  iterations  remaining.  This  resembles  the  cooling  schedule  of  optimization 
algorithms  such  as  simulated  annealing.  We  refer  to  this  as  the  annealing  method. 


4  Simulation  and  Results 


The  algorithm  works  well  on  simulated  data  comprising  up  to  6  frequencies,  as  we  shall  illus¬ 
trate  with  a  particular  example.  In  this  example,  the  length  of  a  data  record  is  1024,  the  frequencies 
used  are 


f  =  (/r)r=i  =  (1-1,  3.2,  4.8,  6.8,  8.9,  12.4), 
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and  the  corresponding  magnitudes  and  phases 

0^{0r)U^  =(C,  15,  8,  2,  7,  5), 

=  (0-25,  0.8,  0.4,  0.3,  O.G,  0.7). 

The  frequency  offset  r  was  chosen  to  be  equal  to  0.7.  Gaussian  noise  with  standard  deviation 
7  was  added  to  this  signal.  Figure  1  shows  the  magnitude  of  the  signal  with  noise  present.  The 


0  200  400  600  800  1000  1200 

Sample  Number 

Figure  I :  Magnitude  of  simulated  data 

algorithm  was  started  with  a  random  choice  of  each  of  the  parameters  to  be  estimated  and  run  for 
50  iterations.  The  next  two  figures  show  the  convergence  of  the  various  parameters  towards  the 
actual  values  (given  by  the  dashed  lines),  first  for  the  normalizing  method  (Figure  2)  and  then  for 
the  annealing  method  (Figure  3). 

Both  methods  show  good  convergence.  The  normalizing  method  converged  faster  on  the  com¬ 
ponent  with  largest  amplitude  but  less  quickly  on  the  remaining  amplitudes.  It  also  tended  to  give 
slightly  less  accurate  answers  for  both  the  amplitudes  and  phases.  In  both  cases,  but  especially  in 
the  normalizing  case,  the  phase  estimation  was  less  accurate  when  the  magnitude  of  that  compo¬ 
nent  was  smaller.  Using  both  methods  the  phase  and  frequency  offset  estimates  tended  to  converge 
rapidly. 

It  was  found  that  the  algorithm  worked  better  when  the  frequencies  were  well  spaced.  The 
presence  of  two  close  frequencies  would,  at  least,  slow  the  convergence  and  sometimes  even  result 
in  instability  of  the  algorithm.  As  a  result  the  algorithm  did  not  perform  well  on  simulated  data  of 
this  length  with  more  than  about  6  frequencies. 
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The  algorithm  was  tested  with  four  frequencies  and  varying  amounts  of  noise  up  to  a  standard 
deviation  of  about  100.  While  the  estimates  decreased  in  accuracy  as  the  noise  level  increased,  the 
algorithm  continued  to  converge  and  to  provide  estimates  that  remained  useful. 


5  Test  on  Real  Data 


We  now  demonstrate  the  capability  of  the  algorithm  on  some  real  radar  returns  from  a  propeller 
driven  aircraft.  Figure  4  gives  the  magnitude  of  the  FFT  of  the  original  time  domain  data. 


10“ 


frequency  (kHz) 


Figure  4:  Real  Data:  Frequency  Domain 

As  can  be  seen,  the  data  contains  a  number  of  significant  frequency  components  centred  close 
to  427.  We  chose  this  number  as  an  initial  guess  of  the  frequency  shift  so  as  to  reduce  the  search 
range  for  t.  Thus  the  initial  choice  of  frequencies  were  9  equally  spaced  components,  centred  on 
427  with  spacings  of  28.31 : 

f  =  (/^)  =  (-4,  -3,  -2,  -1, 0, 1, 2,  3, 4)  X  28.31  +  427. 

It  should  be  noted  that,  in  this  data,  there  was  no  prior  knowledge  of  the  doppler  shift,  so  that,  to 
test  the  accuracy  of  the  method,  further  frequency  shifts  were  imposed.  The  data  used  to  provide 
the  convergence  graphs  had  a  shift  of  1.52  over  the  initial  data. 

Figure  5  shows  the  convergence  of  the  various  parameters  over  12  iterations  of  the  normalizing 
form  of  the  algorithm.  The  annealing  form  of  the  algorithm  was  also  applied  to  the  data  for  1 2 
iterations.  It  yielded  the  results  illustrated  in  Figure  6. 
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(a)  Magnitude 


(b)  Phase 


(c)  Frequency  Offset 


Fif’ure  5;  Convergence  of  parameter  estimates  using  normalizing  algorithm 
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Both  algorithms  yielded  good  results;  the  annealing  method  converging  more  slowly,  in  this 
case,  than  the  normalizing  one.  What  is,  perhaps,  significant  here  is  that  in  this  data  there  are 
many  other  significant  frequency  components,  so  that  the  data  no  longer  conforms  even  crudely 
to  the  model.  Nonetheless  it  was  able  using  both  methods  to  give  a  good  estimate  for  the  various 
parameters.  As  already  stated  we  tested  the  algorithm  with  various  artificial  frequency  offsets 
imposed.  The  results  showed  remarkable  consistency,  agreeing  to  within  0.006.  We  have  obtained 
similar  results  with  as  many  as  21  frequency  components  in  this  data. 


6  Conclusion 

We  have  described  two  versions  of  a  method  of  estimating  the  magnitude,  phase  and  frequency 
offset  of  a  number  of  spectral  lines  of  known  relative  frequency  in  a  signal  with  Gaussian  noise. 
The  method  is  based  on  the  expectation-maximization  algorithm,  and  the  two  versions  correspond 
to  different  ways  of  assigning  the  variances  of  the  unobservable  random  variables. 

Both  methods  work  well  for  simulated  data  comprising  up  to  six  frequencies  in  relatively  high 
levels  of  noise,  and  for  over  20  frequencies  from  a  complicated  real  signal  with  many  line  com¬ 
ponents  in  noise.  It  is  difficult  to  compare  the  relative  merits  of  the  two  methods  of  choosing  the 
variances  of  the  unobservable  random  variables.  As  a  crude  generalization,  the  annealing  method 
slightly  outperformed  the  normalizing  method  in  speed  of  convergence  while  the  normalizing 
method  gave  slightly  more  accurate  estimates.  However  the  results  were  very  data  dependent.  The 
convergence  rates  for  individual  lines  were  highly  dependent  on  the  magnitude  of  the  line  for  both 
methods.  For  the  annealing  method,  convergence  of  the  lines  of  larger  magnitude  was  slower  but 
for  the  smaller  lines  faster.  The  normalizing  method  was  exactly  the  opposite. 

Both  methods  worked  less  well  with  narrowly  separated  lines.  We  intend  to  report  on  an 
improved  algorithm  to  handle  this  case  at  a  later  stage. 
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